Optimal Strategies in Infinite-state Stochastic Reachability Games
We consider perfect-information reachability stochastic games for 2 players
on infinite graphs. We identify a subclass of such games, and prove two
interesting properties of it: first, Player Max always has optimal strategies
in games from this subclass, and second, these games are strongly determined.
The subclass is defined by the property that the set of all values can only
have one accumulation point -- 0. Our results nicely mirror recent results for
finitely-branching games, where, on the contrary, Player Min always has optimal
strategies. However, our proof methods are substantially different, because the
roles of the players are not symmetric. We also do not restrict the branching
of the games. Finally, we apply our results in the context of recently studied
One-Counter stochastic games.
Minimizing Running Costs in Consumption Systems
A standard approach to optimizing long-run running costs of discrete systems
is based on minimizing the mean-payoff, i.e., the long-run average amount of
resources ("energy") consumed per transition. However, this approach inherently
assumes that the energy source has an unbounded capacity, which is not always
realistic. For example, an autonomous robotic device has a battery of finite
capacity that has to be recharged periodically, and the total amount of energy
consumed between two successive charging cycles is bounded by the capacity.
Hence, a controller minimizing the mean-payoff must obey this restriction. In
this paper we study the controller synthesis problem for consumption systems
with a finite battery capacity, where the task of the controller is to minimize
the mean-payoff while preserving the functionality of the system encoded by a
given linear-time property. We show that an optimal controller always exists,
and it may either need only finite memory or require infinite memory (it is
decidable in polynomial time which of the two cases holds). Further, we show
how to compute an effective description of an optimal controller in polynomial
time. Finally, we consider the limit values achievable by larger and larger
battery capacity, show that these values are computable in polynomial time, and
we also analyze the corresponding rate of convergence. To the best of our
knowledge, these are the first results about optimizing the long-run running
costs in systems with bounded energy stores.
Comment: 32 pages, corrections of typos and minor omissions
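For reference, the mean-payoff mentioned in the abstract above can be written out as follows. This is a sketch of one common formalization, not necessarily the paper's exact definition: the symbols $c$ (cost per transition), $e_i$ (the $i$-th transition of a run $\omega$) and the choice of $\limsup$ over $\liminf$ are assumptions made here for illustration.

```latex
% Assumed notation: \omega = e_1 e_2 e_3 \dots is an infinite run,
% c(e_i) is the amount of energy consumed by transition e_i.
\[
  \mathrm{MP}(\omega) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} c(e_i)
\]
```

Under a finite battery capacity, the total consumption between two successive recharges is additionally bounded by that capacity.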
Tableaux for Policy Synthesis for MDPs with PCTL* Constraints
Markov decision processes (MDPs) are the standard formalism for modelling
sequential decision making in stochastic environments. Policy synthesis
addresses the problem of how to control or limit the decisions an agent makes
so that a given specification is met. In this paper we consider PCTL*, the
probabilistic counterpart of CTL*, as the specification language. Because in
general the policy synthesis problem for PCTL* is undecidable, we restrict to
policies whose execution history memory is finitely bounded a priori.
Surprisingly, no algorithm for policy synthesis for this natural and
expressive framework has been developed so far. We close this gap and describe
a tableau-based algorithm that, given an MDP and a PCTL* specification, derives
in a non-deterministic way a system of (possibly nonlinear) equalities and
inequalities. The solutions of this system, if any, describe the desired
(stochastic) policies.
Our main result in this paper is the correctness of our method, i.e.,
soundness, completeness and termination.
Comment: This is a long version of a conference paper published at TABLEAUX
2017. It contains proofs of the main results and fixes a bug. See the
footnote on page 1 for details
Analyzing probabilistic pushdown automata
The paper gives a summary of the existing results about algorithmic analysis of probabilistic pushdown automata and their subclasses.
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
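As an illustration of the building block named in the abstract above, here is a minimal sketch of plain value iteration for maximal reachability probabilities in a small MDP. It is not the authors' algorithm: the function name `reachability_value_iteration`, the dictionary encoding of the MDP, the example model, and the "stop when updates are small" rule are all assumptions made for this example.

```python
def reachability_value_iteration(mdp, targets, max_iterations=1000, tol=1e-10):
    """Approximate the maximal probability of reaching `targets`.

    `mdp` maps each state to a dict of actions; each action maps to a list
    of (probability, successor) pairs. This encoding is hypothetical.
    """
    # Start from below: target states have value 1, everything else 0.
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(max_iterations):
        delta = 0.0
        for s in mdp:
            if s in targets or not mdp[s]:
                continue  # targets stay at 1, states without actions stay at 0
            best = max(
                sum(p * values[t] for p, t in successors)
                for successors in mdp[s].values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        # Heuristic stopping rule only: a small update does not by itself
        # bound the distance to the true value.
        if delta < tol:
            break
    return values


# Hypothetical example: action "a" gambles once, action "b" retries forever.
example_mdp = {
    "s0": {
        "a": [(0.5, "goal"), (0.5, "sink")],
        "b": [(0.9, "s0"), (0.1, "goal")],
    },
    "goal": {},
    "sink": {},
}
values = reachability_value_iteration(example_mdp, {"goal"})
```

In this example the true value of `s0` is 1 (repeating action `b` reaches `goal` almost surely), and the remaining error after each update is nine times the size of that update, which shows why a small update alone is not a sound approximation guarantee.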
Optimizing Performance of Continuous-Time Stochastic Systems using Timeout Synthesis
We consider a parametric version of fixed-delay continuous-time Markov chains
(or equivalently deterministic and stochastic Petri nets, DSPN) where
fixed-delay transitions are specified by parameters, rather than concrete
values. Our goal is to synthesize values of these parameters that, for a given
cost function, minimise expected total cost incurred before reaching a given
set of target states. We show that under mild assumptions, optimal values of
parameters can be effectively approximated using translation to a Markov
decision process (MDP) whose actions correspond to discretized values of these
parameters.
Approximating the Termination Value of One-Counter MDPs and Stochastic Games
One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs)
are 1-player, and 2-player turn-based zero-sum, stochastic games played on the
transition graph of classic one-counter automata (equivalently, pushdown
automata with a 1-letter stack alphabet). A key objective for the analysis and
verification of these games is the termination objective, where the players aim
to maximize (minimize, respectively) the probability of hitting counter value
0, starting at a given control state and given counter value. Recently, we
studied qualitative decision problems ("is the optimal termination value = 1?")
for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and
coNP, respectively). However, quantitative decision and approximation problems
("is the optimal termination value >= p", or "approximate the termination value
within epsilon") are far more challenging. This is so in part because optimal
strategies may not exist, and because even when they do exist they can have a
highly non-trivial structure. It thus remained open even whether any of these
quantitative termination problems are computable. In this paper we show that
all quantitative approximation problems for the termination value for OC-MDPs
and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon >
0, we can compute a value v that approximates the value of the OC-SSG
termination game within additive error epsilon, and furthermore we can compute
epsilon-optimal strategies for both players in the game. A key ingredient in
our proofs is a subtle martingale, derived from solving certain LPs that we can
associate with a maximizing OC-MDP. An application of Azuma's inequality on
these martingales yields a computable bound for the "wealth" at which a "rich
person's strategy" becomes epsilon-optimal for OC-MDPs.
Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011,
invited for submission to Information and Computation
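For readers unfamiliar with the concentration bound mentioned in the abstract above, one standard one-sided form of Azuma's inequality is stated below. The notation ($X_k$ for the martingale, $c_k$ for the bounds on its increments) is chosen here for illustration and is not taken from the paper.

```latex
% Let (X_k)_{k \ge 0} be a martingale with bounded increments,
% |X_k - X_{k-1}| \le c_k almost surely for every k \ge 1.
% Then for every n \ge 1 and every t > 0:
\[
  \Pr\bigl[\, X_n - X_0 \ge t \,\bigr]
  \;\le\;
  \exp\!\left( \frac{-t^{2}}{2 \sum_{k=1}^{n} c_k^{2}} \right)
\]
```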
Zero-Reachability in Probabilistic Multi-Counter Automata
We study the qualitative and quantitative zero-reachability problem in
probabilistic multi-counter systems. We identify the undecidable variants of
the problems, and then we concentrate on the remaining two cases. In the first
case, when we are interested in the probability of all runs that visit zero in
some counter, we show that the qualitative zero-reachability is decidable in
time which is polynomial in the size of a given pMC and doubly exponential in
the number of counters. Further, we show that the probability of all
zero-reaching runs can be effectively approximated up to an arbitrarily small
given error epsilon > 0 in time which is polynomial in log(epsilon),
exponential in the size of a given pMC, and doubly exponential in the number of
counters. In the second case, we are interested in the probability of all runs
that visit zero in some counter different from the last counter. Here we show
that the qualitative zero-reachability is decidable and SquareRootSum-hard, and
the probability of all zero-reaching runs can be effectively approximated up to
an arbitrarily small given error epsilon > 0 (these results apply to pMC
satisfying a suitable technical condition that can be verified in polynomial
time). The proof techniques invented in the second case allow one to construct
counterexamples for some classical results about ergodicity in stochastic Petri
nets.
Comment: 20 pages
Mean-Payoff Optimization in Continuous-Time Markov Chains with Parametric Alarms
Continuous-time Markov chains with alarms (ACTMCs) allow for alarm events
that can be non-exponentially distributed. Within parametric ACTMCs, the
parameters of alarm-event distributions are not given explicitly and can be
subject to parameter synthesis. An algorithm solving the epsilon-optimal
parameter synthesis problem for parametric ACTMCs with long-run average
optimization objectives is presented. Our approach is based on reduction of the
problem to finding long-run average optimal strategies in semi-Markov decision
processes (semi-MDPs) and sufficient discretization of parameter (i.e., action)
space. Since the set of actions in the discretized semi-MDP can be very large,
a straightforward approach based on explicit action-space construction fails to
solve even simple instances of the problem. The presented algorithm uses an
enhanced policy iteration on symbolic representations of the action space. The
soundness of the algorithm is established for parametric ACTMCs with
alarm-event distributions satisfying four mild assumptions that are shown to
hold for uniform, Dirac and Weibull distributions in particular, but are
satisfied for many other distributions as well. An experimental implementation
shows that the symbolic technique substantially improves the efficiency of the
synthesis algorithm and makes it possible to solve instances of realistic size.
Comment: This article is a full version of a paper accepted to the Conference
on Quantitative Evaluation of SysTems (QEST) 201
Weak MSO+U with Path Quantifiers over Infinite Trees
This paper shows that over infinite trees, satisfiability is decidable for
weak monadic second-order logic extended by the unbounding quantifier U and
quantification over infinite paths. The proof is by reduction to emptiness for
a certain automaton model, while emptiness for the automaton model is decided
using profinite trees.
Comment: version of an ICALP 2014 paper with appendices